Biurrun Manresa et al. BMC Neuroscience 2013, 14:1 10 
http://www.biomedcentral.com/1471-2202/14/110 



Neuroscience 



RESEARCH ARTICLE Open Access 



Probabilistic model for individual assessment of 
central hyperexcitability using the nociceptive 
withdrawal reflex: a biomarker for chronic low 
back and neck pain 

Jose A Biurrun Manresa 1 *, Giang P Nguyen 1 , Michele Curatolo 1,2 , Thomas B Moeslund 3 and Ole K Andersen 1 



Abstract 

Background: The nociceptive withdrawal reflex (NWR) has been proven to be a valuable tool in the objective 
assessment of central hyperexcitability in the nociceptive system at spinal level that is present in some chronic pain 
disorders, particularly chronic low back and neck pain. However, most of the studies on objective assessment of 
central hyperexcitability focus on population differences between patients and healthy individuals and do not 
provide tools for individual assessment. In this study, a prediction model was developed to objectively assess 
central hyperexcitability in individuals. The method is based on statistical properties of the EMG signals associated 
with the nociceptive withdrawal reflex. The model also supports individualized assessment of patients, including an 
estimation of the confidence of the predicted result. 

Results: up to 80% classification rates were achieved when differentiating between healthy volunteers and chronic 
low back and neck pain patients. EMG signals recorded after stimulation of the anterolateral and heel regions and 
of the sole of the foot presented the best prediction rates. 

Conclusions: A prediction model was proposed and successfully tested as a new approach for objective 
assessment of central hyperexcitability in the nociceptive system, based on statistical properties of EMG signals 
recorded after eliciting the NWR. Therefore, the present statistical prediction model constitutes a first step towards 
potential applications in clinical practice. 

Keywords: Nociceptive withdrawal reflex, Chronic pain, Biomarker, Machine learning, Pattern recognition, 
EMG classification 



Background increased excitability in the central processing of sensory 

Chronic pain states are associated with changes in the input [2,3]. 

nociceptive system that may lead to hypersensitivity, i.e., The assessment of these conditions may be hampered by 

pain after innocuous stimulation or exaggerated pain the subjective and unreliable nature of self-report based 

after low- intensity nociceptive stimulation [1]. Patients instruments. The establishment of objective, affordable and 

with chronic pain syndromes, such as whiplash, fibromyal- reliable measures of pain hypersensitivity would advance 

gia, osteoarthritis, basal ganglia disorders, migraine and the understanding of neural mechanisms behind chronic 

tension-type headache, endometriosis or chronic low pain, provide a basis for improved clinical management of 

back pain may display such pain hypersensitivity after pain, and establish much needed objective measures 

stimulation of healthy tissues, most likely resulting from of treatment success or failure [4]. 

In this regard, the nociceptive withdrawal reflex (NWR) 
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in the assessment of physiological, chemical and pharmaco- 
logical modulation of nociceptive transmission/processing 
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[5-9]. Moreover, it has been shown that the NWR is 
a valuable tool in the objective assessment of central 
hyperexcitability in the spinal nociceptive system that 
is present in many chronic pain disorders [7,9]. However, 
most of the studies on objective assessment of central 
hyperexcitability focus on population differences between 
patients and healthy individuals and do not provide tools 
for individual assessment [7,8]. 

The aim of this study was to develop a method to 
provide objective and individual assessment of central 
hyperexcitability using a biomarker derived from the 
NWR. In order to accomplish this, data from chronic 
low back and neck pain patients and healthy subjects was 
used to construct and test a prediction model based on 
statistical properties of the NWR signals. The methods 
and results of this model are presented and the relevance 
of such biomarker is discussed in the context of individual 
assessment of central hyperexcitability in clinical settings. 

Methods 

Participants 

Data from 280 subjects was collected and divided in two 
groups. One group contained data from 140 patients 
(70 males and 70 females) with chronic low back pain 
or chronic neck pain. Inclusion criteria for chronic 
pain patients were: daily pain of at least 6 months 
duration and pain at the time of testing with an intensity 
of at least 3 using a 10-cm visual analogue scale (VAS), 
whereby 0 = no pain and 10 = worst pain imaginable. 
Exclusion criteria for chronic pain patients were: radicular 
pain (as defined by leg pain associated with a magnetic 
resonance imaging finding of herniated disc or foraminal 
stenosis with contact to a nerve root); peripheral or 
central neurological disorders, diabetes mellitus, insufficient 
knowledge of the German language, pregnancy (as ruled 
out by pregnancy test), breast feeding, intake of oral contra- 
ceptives or hormones, intake of opioids and antidepressants 
during the previous 2 weeks, and intake of other analgesics 
during the 48 hours before testing. The second group 
contained data from 140 healthy subjects (70 males and 70 
females). Exclusion criteria for healthy subjects were the 
same as for the patient group, plus any pain at the time of 
testing or history of chronic pain syndrome of any nature. 
Both groups were evenly distributed with respect to gender 
and age. The age of the subjects ranged from 20 to 80 years 
with a mean age of 50 years. Both groups were tested using 
the same experimental setup for the recording of the NWR. 
All subjects were recruited at the University Hospital of 
Bern, Inselspital. The dataset was originally collected in 
order to establish reference values for the NWR and reflex 
receptive fields (RRF, the area from which the NWR can be 
elicited) in healthy subjects, and to determine if there are 
population differences compared to chronic pain patients 
(for further details, please refer to [9-12]). Written informed 



consent was obtained prior to participation and the 
Declaration of Helsinki was respected. The study was 
approved by the local ethics committee of the Canton 
of Bern (KEK 147/04). 

Setup 

Electrical stimulation 

During testing, participants were lying in a bed, in a quiet 
room. A leg rest was placed under the knees to obtain a 30° 
semiflexion during electrophysiological testing. Ten surface 
stimulation electrodes (15 x 15 mm, type Neuroline 700, 
Ambu A/S, Denmark) were non-uniformly mounted on 
the sole of the foot and a common anode (5x9 cm) was 
placed on the dorsum of the foot, which are subsequently 
denoted as sites 1,2,... ,10. Each stimulus consisted of a 
constant-current pulse train of 5 individual 1-ms pulses 
delivered at 200 Hz (Noxitest IES 230 stimulator, Aalborg, 
Denmark). This stimulus is perceived by the subjects as a 
single pricking/pinching stimulus, and potentially evokes a 
single ankle dorsiflexion/plantarflexion withdrawal reaction 
(depending on the stimulation site). A computer-controlled 
stimulator delivered a stimulus to one electrode at a time in 
a randomized order, with a random inter-stimulus interval 
ranging from 10 to 15 s. For each electrode site, the lowest 
stimulus intensity that evoked pain (i.e., the pain threshold) 
was assessed using a staircase procedure, and a stimulation 
intensity of 1.5 times the pain threshold was selected for 
eliciting the NWR [13], in order to ensure that a 
NWR response was evoked by most of the stimulations. 
Each electrode site was stimulated 4 times. 

Signal recording 

Activity in the tibialis anterior muscle was measured using 
surface electromyography (EMG), as can be seen in 
Figure l.A. Initially the skin was lightly abraded, and then 
two surface electrodes (30 x 22 mm, type Neuroline 720, 
Ambu A/S, Denmark) were placed along the muscle fiber 
direction over the muscle with an inter-electrode distance 
of 20 mm. The signal was amplified (up to 20000 times) 
and filtered (5-500 Hz, 2nd order) by custom-made EMG 
amplifiers (Aalborg University, Denmark). Afterwards, it 
was sampled (2000 Hz) and stored (1000 ms window 
including 200 ms of pre-stimulation activity, commonly 
used to verify that there is no muscle activity before 
the stimulation), using a software specifically developed 
for NWR acquisition and analysis [14]. The NWR was 
quantified within the 60-180 ms post-stimulation interval 
(Figure LB). 

Prediction model 
Data preparation 

Data was divided into three disjoint subsets: training set 
(TR), validation set (V) and test set (TE) [15-17]. The 
training set TR contains the data from which the model 
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(A) (B) 



Time window of interest 




Figure 1 Methodology for NWR stimulation and recording in humans. (A) Reflex responses evoked by distributed electrical stimulation on 
the sole of the foot were recorded by surface EMG at selected muscles. (B) The reflex size was quantified in the time windows of interest 
(usually 60-180 ms after stimulation). 



was derived. The validation set V was used to adjust and 
find the optimal model. Finally, the test set TE was used 
to assess the performance of the optimal model. Data 
was split as 70%-15%-15% of the whole dataset for TR 
(200 subjects), V (40 subjects) and TE (40 subjects) sets, 
respectively. 

Two restrictions were applied when assigning subjects 
into each set. First, the number of patients and healthy 
subjects should be balanced. Second, subjects within each 
group (patient or healthy) should be distributed evenly 
with respect to gender and age. Assuming a data vector of 
N samples X = {X;} i= jjv horn which n samples need to 
be selected following the aforementioned restrictions; two 
age classes: A l = {X t < 50y\ and A 2 = {X t > 50y}, and two 
genders Gi = {X, is male} and Gi = {X, is female} were in- 
cluded. The set S was derived as: 

S = SiuS 2 uS 3 uS4 

where 

Si =A 1 nG 1 S 2 =A 1 nG 2 
S 3 = A 2 nGi S 4 = A 2 nG 2 

and IIS;||, = i..4 = K / 4 . Following these restrictions, the data 
was finally divided as shown in Table 1. 

Feature selection 

Several features were derived from the EMG recordings 
of the NWR in order to perform an initial exploratory 
analysis of their discriminative capacity. Preliminary tests 



showed promising results using the EMG amplitudes 
of the NWR, in line with previous investigations 
[18-21]. Specifically, the exploratory analysis showed 
some differences between the probability distributions 
of EMG amplitudes between patients and healthy sub- 
jects, so it was hypothesized that those differences could 
be used for classification purposes. 

Since the stimulation was repeated four times at each 
site, each sample X ; eX had four signals FyJ =1,4. For 
each signal, a probability distribution histogram was 
constructed to be used as classification feature. To 
compute a probability distribution histogram, it was 
required to determine the number of sub-ranges or 
bins that should be used. To this end, "EMG amplitudes 
of all signals {F^ from all training samples X,- were taken 
into account," where X, e TR. The EMG amplitude range 
was defined as Rg= [min (Fk), max (F,y)]. Since the EMG 
signal values have rather large range Rg, the selected 

Table 1 Data preparation 



Healthy Patient 





A, 


A 2 


G, 


G 2 


A, 


A 2 


G, 


G 2 


TR 


50 


50 


50 


50 


50 


50 


50 


50 


V 


10 


10 


10 


10 


10 


10 


10 


10 


TE 


10 


10 


10 


10 


10 


10 


10 


10 



All 140 patients and 140 healthy subjects were divided into a training set (TR), 
a validation set (V) and a test set (TE). Each set satisfied two criteria: balance 
between patients and healthy subjects, and even distribution on gender 
and age. 
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number of bins should be reasonably high. Therefore, dif- 
ferent numbers of bins, namely 100, 150, 200 and 300 
were tested. The experiments showed that in all the 
cases, there were very few extreme EMG amplitude 
values, in the order of ~1/1000, which means that the 
probability distribution of EMG amplitudes in the 
area of the two ends of Rg is ~0. The data was 
mainly distributed within the range rg = [fi-a, fi + a], 
where fi and a denote the mean and the standard devi- 
ation of {Fy}, respectively, so this new, restricted range was 
used. With the new range rg and selecting the model using 
a leave-one-out cross-validation procedure [22], only 30 
bins were required to construct a probability distribution 
histogram, effectively reducing the number of features 
to be used as input to the classifier. Figure 2 shows 
the average histogram (an estimate of the probability 
distributions of EMG amplitudes) for patients and 
healthy volunteers. 

Prediction scheme 

Given a set of training subjects {X,- e TR}, which contain TV* 
and A?" training samples from healthy subjects and patients 
respectively, probability distributions P,y were computed for 

each signal Fqj = 1, 4 of a subject X;, i = 1, N h + N p . The 
subject X ; was represented by the probability P„ where: 



P, 



E4 



X,- was assigned with a label L t = h if X, was a healthy 
subject and L 2 = p otherwise. When a query subject was 
sent to the prediction model to be classified as a patient 
or a healthy subject, the decision was made based on 
nearby training subjects with respect to the query sub- 
ject. This approach is known as the /c-nearest neighbour 
(kNN) [23]. kNN is an efficient algorithm in machine 
learning, showing comparable classification performance 
to more complex algorithms [16,24]. Briefly, when given 



a query subject, the kNN algorithm searches in the 
training set for the k subjects that are closest to the 
query, based on some predefined criterion to measure 
closeness (e.g. Euclidean distance). The query subject is 
then assigned to the group which has the majority 
among k subjects. Regarding the value of k, there is 
no specific approach for selecting an optimal value, as 
this strongly depends on the data structure [25]. Using 
leave-one-out cross-validation [22], a k value of 5 was 
finally selected. 

The proposed prediction scheme was implemented as 
follows: given an unknown subject X q {X q e V or X q e TE), 
a probability distribution P q j was computed for each 
signal F q jJ= 1,4. Then, each probability distribution 
Pqj was compared with all training probability distri- 
butions Pi,i= l,N h +N p . The Euclidean distance be- 
tween these distributions was calculated as: 



dij 



Where k training subjects with smallest distance were 
considered, and x = 1 , 30 represents the size of each 
classification feature (i.e. the 30 bins of each histogram). 
If the majority of these k training subjects had label h 
then the signal F^ was labelled as healthy, and vice 
versa. Therefore, X q had four predicted labels, one for 
each of the four signals Fqj. The final prediction result 
for the unknown subject X q was decided based on a vot- 
ing mechanism, where the majority of votes was chosen 
to classify the subject as belonging to a patient or a 
healthy group. With four signals, there were cases where 
voting result was equal, i.e. 50:50 prediction results. 
From a clinical perspective, it is better to minimize the 
chance of missing any potential patients. Therefore, 
in case of equal voting, the unknown subject X q was 
classified as a patient. 




o.o 



■ ■ ■ I 
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Figure 2 Average histogram of EMG signals from all patients and healthy subjects. Error bars represent standard deviation 
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Individual assessment 

The prediction model also allows for individual assess- 
ment; as mentioned in the previous section, each signal 
Fqj of an unknown subject X q returned a predicted 
label, according to the majority of k nearest neigh- 
bours. This label can be described as a probability 
value. The following formulas were used to compute 
the probability value for each signal F q j to be labelled 
as patient or healthy, respectively: 



?■ 



k 

\L k = h\ 



Where L denotes the labels of k nearest neighbours at 
the y'-th signal. For example, for the ;'-th signal, if all k 
nearest neighbours had the same label p, then that 
particular signal F q j was labelled p with a probability of 
100%. Another example: if / 5 neighbours are labelled p 
and 2 / 5 neighbours are labelled h, then the signal was 
labelled p with probability of ~60%. In other words, the 
signal F^ had ~60% probability of belonging to a 
chronic pain patient and ~40% probability of belonging 
to a healthy subject. With four signals for each subject, 
the individual assessment can be obtained by: 



2& 

—2L(o/ n 



If F is higher or equal to 50%, the query subject X q is 
finally classified as patient, otherwise it is classified as 
healthy. Furthermore, the above value indicates how 
likely a subject is predicted as a patient: higher values 
result in higher confidence in the assessment. Figure 3 
shows a general overview of the proposed prediction 
scheme. 

Model validation 

The validation set V was used to find an optimal 
combination of different factors in order to achieve 
the highest prediction rate. As mentioned in the Data 
preparation section, 40 subjects (of which 20 are 
healthy subjects and 20 are patients) were included in V. 
Further experiments to validate the prediction model were 
conducted with different number of training subjects, 
since this number can influence the performance of the 
model [10]. Out of the 200 training subjects (100 patients 
and 100 healthy subjects), different subsets were extracted 
as a new training sets. 1st 1 and IsF denote the number of se- 
lected training healthy subjects and patients, respectively. 
Since the training set had to follow the rules established in 



the Data preparation section, TV* and tf were selected 
such that they were modulus 4: 

{N\,N{) = (12,12) {N\,N P 2 ) = (24,24) 
(A&ATf) = (32,32) (N h 4 ,N p 4 ) = (48,48) 
(N h 5 ,N» 5 ) = (60,60) (N h 6 ,N%) = (80,80) 
(N h 7 ,N p 7 ) = (100,100) 

When Nj andAff < 100, more than one combination 
among training subjects to select a subset were available. 
For the model validation, 10 combinations were ran- 
domly chosen for each case. It should be noted that all 
combinations should obey the even distribution restric- 
tions. Therefore, for each set of values (Nf,N p ^j ,1= 1,6, 
10 training sets were collected, namely TR l t ,t= 1,10. 
The same validation set V was used in all cases to test 
the prediction model. 

Model evaluation 

Different parameters might affect the performance of 
the prediction model, such as selection of training 
set, number of training subjects and stimulation sites 
(see next section). The validation set V was first used 
to tune these parameters. Once the model was optimized, 
the test set TE was used to evaluate the real performance 
of the model, without further parameter tuning. Since it is 
known in advance which subjects belong to the healthy 
group and which ones belong to the patient group, this 
knowledge was used to evaluate the prediction model. 
The evaluation of the model's performance was based on 
prediction rates: 

M+ , . 

Where M + denotes the number of correct classified 
subjects among total number of subjects for validation 
or test, respectively, i.e. M= ||v|| or M = TE depending 
on whether the validation set or the test set was used. 
Higher prediction rates indicate better performance of 
the model. To avoid cases where the prediction rate was 
high but most of the query subjects were classified as 
either patient or healthy, the previous equation was also 
applied to each group separately. In other words, it was 
extended as 



„ = -^(%)and^ = -^ 



Where denotes the number of healthy subjects 
which were correctly classified and M+ denotes the 
number of patients which were correctly classified. 
Mh and M p are the total number of healthy subjects 
and patients for validation or test, respectively. 
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Figure 3 Scheme of the proposed probabilistic prediction model. (A) Given a query subject X q e TE, a set of EMG signals F^J = 1,4 are 
obtained as a response to repeated electrical stimulation often sites on the sole of the foot. (B) A probability distribution histogram P qj is 
constructed from each signal F ql (or combination of signals from multiple sites) to be used as classification feature. (C) The signal F q j is labelled p 
(for patient) or h (for healthy), depending on the distances d qj to the closest neighbouring histograms P,, derived from the set of training subjects 
{X, eTR). (D) The final prediction for the subject X q is carried out based on the labels Iqj derived from the individual assessment of all four signals. 
Query subjects X^eV were used instead for all validation procedures (site combination and training set selection). 



Results 

Model validation 
Stimulation site evaluation 

Several factors can influence the EMG signals recorded 
after stimulation of each site depending on the location of 
the electrode on the sole of the foot, such as skin thick- 
ness or nerve fiber density [24]. Eventually, sites with low 
prediction rates should be eliminated, meaning that fewer 
electrodes will be needed to place on a subject's sole of the 
foot in a potential future application of the NWR as a 
biomarker for individual assessment of pain. Therefore, 
the first part of the model's validation was carried out to 
compare the performance using each of the ten stimula- 
tion sites separately. Following the scheme (Figure 3), each 
subject from the validation set V was sent as input. kNN 
was applied with k=5. The average prediction rate 
r was reported. Given (Nf,Nl),l= 1,6 training 
subjects, average prediction performance over 10 runs 
with TR[, t = 1, 10 at each site was reported. As there 
was only one set of (Nj,Nj) training subjects, only one 
prediction result was obtained for this case (see Table 2). 



The best performance at each site over different 
number of training subjects is displayed in Table 2. 
To decide which sites should be discarded, a threshold of 
75% for the prediction rate was used. From this, 3 sites 
were selected whereas the remaining 7 were discarded 
(see Figure l.B for an illustration of the locations of 
selected and eliminated sites). The final results on 
stimulation site evaluation suggested that signals 
recorded by stimulation of sites 3, 9 and 10 should 
be chosen for further evaluation. 

Site combination 

Following with the model's validation, the analysis was 
focused on the remaining 3 sites, namely sites {3, 9, 10}. 
Different combinations among those sites were also 
tested: {3, 9}, {3, 10}, {9, 10} and {3, 9, 10}. When more 
than one site was used, the voting mechanism was 
applied. Results are displayed in Table 3 for compari- 
sons between single site and combinations of sites. 
Results showed that in general, the prediction rates 
were improved by combining sites compared to using 
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Table 2 Average prediction rates at each site with different numbers of training subjects 





(12,12) 


(24,24) 


(32,32) 


(48,48) 


(60,60) 


(80,80) 


(100,100) 


Best performance 


Site 9 


68% 


68% 


73% 


73% 


75% 


77% 


85% 


85% 


Site 10 


64% 


66% 


69% 


66% 


67% 


68% 


80% 


80% 


->lLc j 


J / 70 


OZ/0 


DjTO 




1") OA 
/ Z70 


7^ OA 

1 370 


/o7o 


7QQA 

/O70 


Site 8 


57% 


65% 


61% 


66% 


67% 


71% 


63% 


71% 


Site 1 


53% 


59% 


62% 


61% 


65% 


69% 


70% 


70% 


Site 5 


51% 


56% 


58% 


61% 


60% 


69% 


65% 


69% 


Site 2 


56% 


63% 


61% 


62% 


62% 


64% 


68% 


68% 


Site 7 


59% 


59% 


58% 


64% 


63% 


61% 


63% 


64% 


Site 4 


50% 


53% 


52% 


59% 


54% 


58% 


63% 


63% 


Site 6 


51% 


62% 


55% 


60% 


59% 


58% 


53% 


62% 



Sites are sorted based on best performance in the last column. 



healthy subjects and the prediction rate for patients, 
respectively. From previous validation results, a com- 
bination of EMG signals recorded after stimulation of 
sites 9 and 10 was used. Figure 4 shows the predic- 
tion result for each case, demonstrating that even 
with only 12 training subjects for each group, a well 
selected subset can reach up to 80% correct classified 
on average (Figure 4.A, t = 7). It was also observed 
that some of the training subsets did not perform 
very well, with prediction rates lower than 60%. In 
general, higher number of training subjects (48, 60 and 80, 
Figures 4.D, 4.E and 4.F, respectively) resulted in an im- 
provement in the performance over lower number of train- 
ing subjects (12, 24 and 32, Figures 4.A, 4.B and 4.C, 
respectively). With higher number of training subjects 
(48, 60 and 80), performances among different subsets 
were rather comparable. In case of (N^,N P 7 ) with t=l 
(only one training set), results were 75%, 95% and 85% for 
r p , r h and r, respectively. 

The case with highest prediction rate was selected 
for each set (Ni,N p ), As mentioned before, balance 
between r h and r p is also important. This means that 



Table 3 Average prediction rates with different numbers of training subjects 





(12,12) 


(24,24) 


(32,32) 


(48,48) 


(60,60) 


(80,80) 


(100,100) 


Best performance 


Sites {9,10} 


67% 


70% 


73% 


73% 


77% 


77% 


85% 


85% 


Sites {3,9,10} 


64% 


68% 


67% 


74% 


77% 


79% 


85% 


85% 


Site 9 


68% 


68% 


73% 


73% 


75% 


77% 


85% 


85% 


Sites {3,10} 


64% 


69% 


69% 


72% 


78% 


80% 


80% 


80% 


Site 10 


64% 


66% 


69% 


66% 


67% 


68% 


80% 


80% 


Sites {3,9} 


67% 


67% 


70% 


76% 


77% 


77% 


78% 


78% 


Site 3 


57% 


62% 


63% 


69% 


72% 


75% 


78% 


78% 



a single site. Overall, the best performance (85% cor- 
rect predictions) was reached by a combination of 
EMG signals recorded after stimulation of sites {9, 
10}, {3, 9, 10} and {9}. The combination {9, 10} was 
finally chosen because it gave the best average predic- 
tion with different numbers of training subjects. 

Training set selection 

In general, higher numbers of training samples do not 
always lead to better training model [15], since at a cer- 
tain point the model will not improve further or even 
drop down in performance. To find an optimal number 
of training samples, an empirical approach is often used 
[24,26]. The performance of the prediction model was 
influenced by the number of training subjects, as shown 
in Tables 2 and 3. Moreover, with the same number of 
training subjects (N* ,N p ) ,1= 1, 6, different subset TRj, 
t = 1, 10 also gave different prediction rates. With each 
set (N^,N p ),l= 1,6, three values were reported for a 
training set TR l tl t = 1,10: r, and r p , representing 
the average prediction rate, the prediction rate for 



Prediction using four combinations of site were reported and compared to the classification performance of each of the three selected sites separately. Sites are 
sorted based on best performances displayed in the last column (when best performance is similar, sites are sorted based on average performance). 
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Figure 4 Comparison of prediction rates for each set of training subjects. Panels A to F show the average prediction performance of ten 
runs for each set (Nf 1 , Nf) , I = 1,6, respectively. 
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the selected model should have a high average predic- 
tion rate r, and at the same time, the difference be- 
tween r h and r p should be small. For instance, in 
Figure 4.B, at t = 10, the average prediction rate was r = 
72% but difference between = 50% and r p = 95% was 
large, i.e., most of the subjects were predicted as patients. 
Therefore, this model should not be chosen. In Figure 4.F, 
average prediction rates at t = 3 and t = 9 were equal, but 
case t = 9 was selected because Hr^-r^H was smaller 
than in case t = 3. Comparison among best prediction 
rates with different numbers of training subjects is 
displayed in Table 4. The results show a tight compe- 
tition between 3 sets (N h 5 ,N p 5 ) , {N h 6 ,N p 6 ) and 
(Nj,Nj) with average rates of 85%. However, with 
the same premise discussed above, (N^N^) and 
(ATg,iVg) were better choices than (Nj,Nj) . Using 
these two sets yielded more comparable performance 
between patient and healthy group. Finally, (N^Nq) 
was selected with t = 9 as the training set for the 
proposed model. 



Model evaluation 

The model validation stage was used to determine opti- 
mal parameters for the prediction model, resulting in a 
combination of EMG signals recorded after stimulation 
of sites 9 and 10 and 80 training subjects for each group 
(t = 9). The optimized model was tested with the test set 
TE. The test set contained 40 subjects (20 patients and 
20 healthy volunteers). Following the scheme in Figure 3, 
the evaluation with the test set returned an average pre- 
diction r = 80% with r h = r p = 80%. This means that with 
40 query subjects, 8 were misclassified, in which 4 
healthy subjects were misclassified as patients and 4 pa- 
tients were misclassified as healthy subjects. Since the 
test set was evenly distributed with respect to gender 
and age, misclassified subjects were grouped based on 
these two factors to see how they affected prediction 
results. The following values were computed: 



Table 4 Comparison of the best performances between 
different numbers of training subjects 
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Results revealed differences among misclassified subject 
based on both factors. There were 7 female subjects 
misclassified (u Gl = 87.5%). The age factor also showed ra- 
ther strong influence on the prediction result: v Al =75% 
meaning that 6 out of 8 misclassified subjects were from 
the elder age group (age > 50 years). 

Comparison with current methods 

Critical values to assess widespread central hyperexcitability 
using the NWR and RRF have recently been published 
[10]. In particular, estimates of 95 th , 90 th and 75 th percentile 
values of the distribution of the test responses have been 
obtained (named p , p and p , respectively) by comput- 
ing quantile regressions for the each assessment method 
(e.g. NWR thresholds, RRF areas). More extreme values 
(e.g. p 95 ) are more likely to lead to the correct identification 
of patients, but could leave out a number of subjects that 
could also potentially be at risk, whereas critical values that 
correspond to more central percentiles of the distribution 
(e.g. p 75 ) would include more subjects potentially at risk, 
but at the cost of misclassifying more healthy subjects as 
presenting hyperexcitability. 

With the present dataset, it was possible to compute 
individual RRF areas for chronic pain patients and healthy 
subjects (for details, please refer to [27]), and compare 
them to these critical values to obtain equivalent classifi- 
cation rates r p , r h and r using the same test set TE. The 
classification rates for the most restrictive conditions 
(p 95 and p 90 ) were r p = 0%, = 100% and r = 50%, whereas 
for the least restrictive condition (p 75 ) the classification 
rates were r p = 20%, r/, = 95% and r = 57.5%. This means 
that the distribution of RRF areas of patients and healthy 
controls largely overlap resulting in criteria that is too 
restrictive for detecting central hyperexcitability, as evidenced 
by the very low classification rates for patients obtained 
with this method. 

Discussion 

Experimental and clinical studies in diverse cohorts 
of patients (e.g., whiplash, fibromyalgia, osteoarthritis, 
musculoskeletal disorders, headache, and neuropathic, 
visceral and post-surgical pain) have shown that 
these pathologies share common features, which are 
likely to reflect alterations in central nociceptive pro- 
cessing [28,29] leading to exaggerated pain sensitivity. 
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It has been previously established that changes in central 
nociceptive processing can be detected by electrophysio- 
logical tests, such as those based on the NWR. In the past, 
the NWR has been widely used as a biomarker in the 
assessment of the state of the nociceptive system [5,6,30], 
and it has been proposed as a key tool in the research of 
central sensitization mechanisms, which are believed to be 
linked to the development of chronic pain [5,7,8,31,32]. 

In this regard, a number of studies showed that several 
patient groups present lower NWR thresholds compared 
to control groups of healthy volunteers [7,8,33,34]. More- 
over, it was also demonstrated that chronic pain patients 
(endometriosis, chronic low back and chronic neck pain) 
present larger RRF compared to pain free subjects [12,31]. 
Lower NWR thresholds and enlargement of RRF are 
objective signs of central hyperexcitability, which could be 
a consequence of increased number of responsive spinal 
neurons or an expansion of the receptive fields of spinal 
neurons as a result of increased synaptic sensitivity 
[35,36]. In the light of these facts, there is clear evidence 
that groups of patients with different chronic pain condi- 
tions display on average altered central pain processing. 
However, the next translational step in this field involves 
the definition of diagnostic criteria in individual patients, 
in order to develop treatments that are tailored to detect 
individual disturbances in central pain processing [29] . 

In this study, a set of features derived from the NWR 
of chronic low back and neck pain patients and healthy 
volunteers was used as input to a prediction model, in 
order to test the hypothesis that the NWR contains 
specific information that would allow individual clas- 
sification regarding the condition of the test subjects. 
Several features derived from the NWR have been 
used in the past for detection or quantification pur- 
poses: NWR latencies, raw EMG amplitudes, mean 
and peak EMG values, EMG probability distribution, 
EMG root-mean-square (RMS), z-scores and RRF area 
size, among others [6,27,32,37-41]. Additionally, other 
EMG features have also been used in classifications 
tasks in other fields (most notably myoelectric control 
systems), such as number of zeros crossings, slope 
sign changes, spectral moments, as well as frequency 
domain and time-scale features [42-44]. 

A preliminary analysis showed that, among these vari- 
ables, EMG probability distributions showed the most 
promising results in terms of discriminating potential 
for classifying between patients and healthy volunteers. 
Thus, they were selected for further development of the 
prediction model. However, the EMG signals showed a 
rather large range, thus requiring a high number of bins 
for their histogram representation. In order to overcome 
this, a new range was defined, restricting the original 
range around one standard deviation of the mean, In 
this way, less bins were required for the representation 



(as a simple method of feature selection), effectively re- 
ducing the number of features to be fed to the predic- 
tion model. 

The evaluation of stimulation sites for eliciting the 
NWR revealed that EMG signals recorded after stimulation 
of electrodes located in the anterolateral (site 3) and heel 
(sites 9 and 10) regions and of the sole of the foot 
presented the best prediction rates. This is in accordance 
with previous research showing that the RRF in chronic 
low back and neck pain patients are expanded compared 
to healthy volunteers, precisely towards these regions 
[31,45]. On the other hand, EMG signals recorded after 
stimulation of sites located at or around the arch on the 
sole of the foot resulted in the worst prediction rates. 
These locations often have thin skin layer which lead to 
higher pain sensitivity and large reflexes regardless of 
whether they are patients or healthy subjects [46]. 

The final model evaluation showed an average predic- 
tion rate of 80%. For that particular choice of model, 
there were no differences in the misclassification rates 
between healthy subjects and patients. A more detailed 
analysis of the results focusing on the demographics of 
the two groups, revealed that women and elder subjects 
are more likely to be misclassified using the selected 
model. To date, there are no studies describing age or 
gender differences in EMG signals recorded from 
chronic pain patients compared healthy volunteers in re- 
lation to the NWR, since most of the research is focused 
on other biomarkers, most notably the NWR thresholds 
to single and repeated stimulation (temporal summa- 
tion), and the RRF areas [9,26,32]. In this regard, there is 
still no agreement on the effects of age and gender on 
the NWR, although most of the evidence seems to point 
towards generally lower NWR thresholds in women and 
elderly subjects, most likely due to reduced endogenous 
analgesic mechanisms [6,10,47,48]. 

To date, only population differences have been reported 
between chronic low back and neck pain patients 
compared to healthy volunteers, showing an enlargement 
of the RRF in patients [12]. More recently, however, refer- 
ence ranges for the NWR and RRF have been established 
for healthy subjects [10]. These ranges establish critical 
values for several parameters derived from the NWR 
(e.g. NWR threshold to single and repeated electrical 
stimulation, RRF area), above which an individual subject 
can be considered to present widespread central hyper- 
excitability. Results using this method with RRF areas as 
the classification parameters showed lower average classi- 
fication rates (r = 57.5%) and very low classification rates 
for patients {r p = 20%) compared to the prediction model. 
This is most likely due to the large inter-individual 
variation of the RRF areas and the high overlap that 
exists between the probability distribution of RRF areas in 
patients and healthy subjects. 
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Limitations and future work 

This work focused on the assessment of central 
hyperexcitability in individual chronic low back and 
neck pain patients using the NWR. Although the 
quantification of the NWR does not rely on subjective 
self-reports of pain sensation, it is subjected to supraspinal 
modulation. External factors involving affective and cogni- 
tive processes or other ongoing nociceptive processes 
(e.g. endogenous pain modulatory mechanisms) can 
affect the NWR characteristics [6,24], so these factors 
have to be carefully controlled for in order to provide 
reliable outcomes. Further tests in other patient groups 
should be conducted in order to test if this model could 
also be used to characterize other pain conditions. 

Furthermore, this is the first known attempt at 
individualized classification between healthy subjects 
and chronic pain patients based on the assessment of 
central hypersensitivity provided by the NWR. As it 
is common in classification tasks, there are several 
variables that require a careful selection, such as the 
choice of features to be used as input to the predic- 
tion model (in this case, the EMG probability distri- 
bution), the parameters of the classifier (for kNN, 
the number of neighbours), the size of the datasets 
for classification, validation and test, and the number 
and location of stimulation sites selected. Some of 
these variables were chosen based on prior know- 
ledge and/or empirical tests, so whereas the proposed 
statistical model is able to achieve high prediction 
rates, future research could focus on the application 
of more advanced signal processing methods, e.g. 
alternative methods for feature generation and selec- 
tion, adaptive histograms, adaptive kernel density estima- 
tors and optimal parameter selection for the classifier, 
among others. 

Conclusions 

A prediction model was proposed as a new approach 
for objective and individual assessment of central 
hyperexcitability in the nociceptive system. The 
model was developed using statistical properties of 
EMG signals recorded after eliciting the nociceptive 
withdrawal reflex. The model supports individualized 
assessment of patients, including an estimation of the 
confidence of the predicted result. Evaluation was 
carried out using an independent test set of healthy 
subjects and chronic pain patients and a high prediction 
rate of 80% was achieved. Therefore, the present statistical 
prediction model constitutes a first step towards potential 
applications in clinical practice. 
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